In summary, this DOD agreement not only allowed us to perform a comprehensive study of data stream mining and its promising
applications in biomedical domains, but also fostered and enriched the research experiences of under-represented minority students at Xavier
and opened opportunities for them in graduate school or future careers in the IT industry. Moreover, this support has motivated us to explore
other important aspects of mining streaming data, such as anomaly or outlier detection, which merit several more years of investigation, as
described in the PI’s proposal submitted to the “Fiscal Year 2015 Department of Defense Research and Education Program for Historically
Black Colleges and Universities and Minority-Serving Institutions (HBCU/MI)”. Below are the key statistics achieved by our project.
Peer-reviewed publications: 11 from the PI’s lab and 42 from the Co-PI’s site
Trained  minority  undergrads:  20 

#  of  graduates:  11;  4  working  in  IT  industry  and  2  attending  graduate  school 
Developed  new  courses:  2 


Enter  List  of  papers  submitted  or  published  that  acknowledge  ARO  support  from  the  start  of 
the  project  to  the  date  of  this  printing.  List  the  papers,  including  journal  references,  in  the 
following  categories: 

(a)  Papers  published  in  peer-reviewed  journals  (N/A  for  none) 


Received  Paper 


05/13/2015  19.00  Wensheng  Zhang,  Andrea  Edwards,  Erik  Flemington,  Kun  Zhang,  Shannon  M.  Hawkins.  Somatic 
Mutations  Favorable  to  Patient  Survival  Are  Predominant  in  Ovarian  Carcinomas, 

PLoS ONE, (11 2014): 1. doi: 10.1371/journal.pone.0112561

08/27/2012  3.00  Wensheng  Zhang,  Andrea  Edwards,  Wei  Fan,  Erik  K.  Flemington,  Kun  Zhang.  miRNA-mRNA  Correlation- 
Network  Modules  in  Human  Prostate  Cancer  and  the  Differences  between  Primary  and  Metastatic  Tumor 
Subtypes, 

PLoS ONE, (06 2012): 0. doi: 10.1371/journal.pone.0040130

08/29/2013 5.00 S. Yang, S. Pounds, Kun Zhang, Z. Fang. PAIR: paired allelic log-intensity-ratio-based normalization
method  for  SNP-CGH  arrays, 

Bioinformatics,  (11  2012):  0.  doi:  10.1093/bioinformatics/bts683 

08/29/2013  6.00  Zhide  Fang,  Ruofei  Du,  Andrea  Edwards,  Erik  K.  Flemington,  Kun  Zhang,  Yan  Gong.  The  Sequence 
Structures  of  Human  MicroRNA  Molecules  and  Their  Implications, 

PLoS ONE, (01 2013): 0. doi: 10.1371/journal.pone.0054215

08/30/2014  12.00  Xiaoxiao  Shi,  Jean-Francois  Paiement,  David  Grangier,  Philip  S.  Yu.  GBC:  Gradient  boosting  consensus 
model for heterogeneous data,

, (06 2014): 0. doi: 10.1002/sam.11193

08/30/2014  13.00  Andrea  Edwards,  Wei  Fan,  Wensheng  Zhang,  Zhide  Fang,  Prescott  Deininger,  Kun  Zhang.  Inferring  the 
expression  variability  of  human  transposable  element-derived  exons  by  linear  model  analysis  of  deep 
RNA  sequencing  data, 

BMC  Genomics,  (08  2013):  0.  doi:  10.1186/1471-2164-14-584 

08/30/2014  14.00  Wensheng  Zhang,  Andrea  Edwards,  Erik  K.  Flemington,  Kun  Zhang,  Peter  Csermely.  Inferring 

Polymorphism-Induced  Regulatory  Gene  Networks  Active  in  Human  Lymphocyte  Cell  Lines  by  Weighted 
Linear  Mixed  Model  Analysis  of  Multiple  RNA-Seq  Datasets, 

PLoS ONE, (10 2013): 0. doi: 10.1371/journal.pone.0078868

08/30/2014  15.00  Bo  Liu,  Yanshan  Xiao,  Philip  S.  Yu,  Zhifeng  Hao,  Longbing  Cao.  An  Efficient  Approach  for  Outlier 
Detection  with  Imperfect  Data  Labels, 

IEEE Transactions on Knowledge and Data Engineering, (07 2014): 0. doi: 10.1109/TKDE.2013.108

08/30/2014  16.00  Bo  Liu,  Yanshan  Xiao,  Philip  S.  Yu,  Longbing  Cao,  Yun  Zhang,  Zhifeng  Hao.  Uncertain  One-Class 
Learning  and  Concept  Summarization  Learning  on  Uncertain  Data  Streams, 

IEEE Transactions on Knowledge and Data Engineering, (02 2014): 0. doi: 10.1109/TKDE.2012.235

TOTAL:  9 


Number  of  Papers  published  in  peer-reviewed  journals: 

(b)  Papers  published  in  non-peer-reviewed  journals  (N/A  for  none) 


Received  Paper 


TOTAL: 


Number  of  Papers  published  in  non  peer-reviewed  journals: 


(c)  Presentations 


Number  of  Presentations:  4.00 


Non  Peer-Reviewed  Conference  Proceeding  publications  (other  than  abstracts): 


Received  Paper 


TOTAL: 


Number  of  Non  Peer-Reviewed  Conference  Proceeding  publications  (other  than  abstracts): 


Peer-Reviewed  Conference  Proceeding  publications  (other  than  abstracts): 


Received  Paper 


05/13/2015  18.00  Ke  Wu,  Kun  Zhang,  Wei  Fan,  Andrea  Edwards,  Philip  S.  Yu.  RS-Forest:  A  Rapid  Density  Estimator  for 
Streaming  Anomaly  Detection, 

2014  IEEE  International  Conference  on  Data  Mining  (ICDM).  13-DEC-14,  Shenzhen,  China.  :  , 

05/13/2015 23.00 Wensheng Zhang, Andrea Edwards, Prescott Deininger, Kun Zhang. The Duplication and Intragenic
Domain  Expansion  of  Human  C2H2  Zinc  Finger  Genes  Are  Associated  with  Transposable  Elements  and 
Relevant  to  the  Expression-based  Clustering, 

BICoB-2015.  05-MAR-15,  .  :  , 

05/13/2015  22.00  Lifang  He,  Xiangnan  Kong,  Philip  S.  Yu,  Zhifeng  Hao,  Bokai  Cao,  Ann  B.  Ragin.  Tensor-Based  Multi-view 
Feature  Selection  with  Applications  to  Brain  Diseases, 

2014  IEEE  International  Conference  on  Data  Mining  (ICDM).  13-DEC-14,  Shenzhen,  China.  :  , 

05/13/2015  21.00  Bokai  Cao,  Xiangnan  Kong,  Philip  S.  Yu.  Collective  Prediction  of  Multiple  Types  of  Links  in 
Heterogeneous  Information  Networks, 

2014  IEEE  International  Conference  on  Data  Mining  (ICDM).  13-DEC-14,  Shenzhen,  China.  :  , 

05/13/2015  20.00  Jiawei  Zhang,  Philip  S.  Yu,  Zhi-Hua  Zhou.  Meta-path  based  multi-network  collective  link  prediction, 
the  20th  ACM  SIGKDD  international  conference.  23-AUG-14,  New  York,  New  York,  USA.  :  , 

08/25/2012 1.00 Sihong Xie, Guan Wang, Shuyang Lin, Philip S. Yu. Review Spam Detection via Temporal Pattern
Discovery, 

the  18th  ACM  SIGKDD  International  Conference  on  Knowledge  Discovery  &  Data  Mining.  12-AUG-12,  .  : 


08/25/2012 2.00 Guan Wang, Yuchen Zhao, Xiaoxiao Shi, Philip S. Yu. Magnet Community Identification on Social
Networks, 

the  18th  ACM  SIGKDD  International  Conference  on  Knowledge  Discovery  &  Data  Mining.  12-AUG-12,  .  : 


08/29/2013 4.00 Jing Peng, Kun Zhang. A Margin Technique for Dimension Reduction with Applications to Hyperspectral
Imagery,

International Conference on Advanced Computer Science and Electronics Information (ICACSEI 2013).
20-MAY-13, . : ,

08/29/2013  7.00  Yanshan  Xiao,  Philip  S.  Yu,  Zhifeng  Hao,  Bo  Liu.  MODS:  Multiple  One-class  Data  Streams  Learning 
from  Homogeneous  Data, 

SIAM  Data  Mining  Conference,  2013.  02-MAY-13,  .  :  , 

08/29/2013 8.00 Yuchen Zhao, Philip S. Yu. On Graph Stream Clustering with Side Information,

SIAM  Data  Mining  Conference,  May  2013.  02-MAY-13,  .  :  , 

08/29/2013 9.00 Bo Liu, Yanshan Xiao, Philip S. Yu, Longbing Cao, Zhifeng Hao. Robust Textual Data Streams Mining
Based  on  Continuous  Transfer  Learning, 

SIAM  Data  Mining  Conference,  2013.  02-MAY-13,  .  :  , 

08/29/2013  10.00  Chang-Dong  Wang,  Jian-Huang  Lai,  Philip  S.  Yu.  Dynamic  Community  Detection  in  Weighted  Graph 
Streams, 

SIAM  Data  Mining  Conference,  2013.  02-MAY-13,  .  :  , 


08/29/2013 11.00 Xiaoxiao Shi, Philip Yu. Dimensionality Reduction on Heterogeneous Feature Space,

IEEE Intl. Conf. on Data Mining, 2012. 10-DEC-12, . : ,

08/30/2014 17.00 Ke Wu, Andrea Edwards, Wei Fan, Jing Gao, Kun Zhang. Classifying Imbalanced Data Streams
via Dynamic Feature Group Weighting with Importance Sampling,
the  14th  SIAM  International  Conference  on  Data  Mining.  24-APR-14,  .  :  , 

TOTAL:  14 


Number  of  Peer-Reviewed  Conference  Proceeding  publications  (other  than  abstracts): 


(d)  Manuscripts 


Received  Paper 


TOTAL: 


Number  of  Manuscripts: 


Books 


Received  Book 


TOTAL: 


Received  Book  Chapter 


TOTAL: 


Patents  Submitted 


Patents  Awarded 


Awards 


Graduate  Students 

NAME 

PERCENT  SUPPORTED 

FTE  Equivalent: 

Total  Number: 

Names  of  Post  Doctorates 


NAME 

PERCENT  SUPPORTED 

FTE  Equivalent: 

Total  Number: 

Names  of  Faculty  Supported 


NAME 

PERCENT  SUPPORTED 

FTE  Equivalent: 

Total  Number: 

Names  of  Under  Graduate  students  supported 


NAME 

PERCENT  SUPPORTED 

Discipline 

Chris Chance

0.20 

CS 

Chris  Cosey 

0.20 

CS 

Brittney  Mack 

0.20 

CS 

Tuan  Nguyen 

0.10 

CS 

Milton  Torrey 

0.10 

CS 

Wesley  Walker 

0.30 

CS 

FTE  Equivalent: 

1.10 

Total  Number: 

6 

Student  Metrics 

This  section  only  applies  to  graduating  undergraduates  supported  by  this  agreement  in  this  reporting  period 

The number of undergraduates funded by this agreement who graduated during this period: 11.00

The  number  of  undergraduates  funded  by  this  agreement  who  graduated  during  this  period  with  a  degree  in 

science, mathematics, engineering, or technology fields: 11.00

The  number  of  undergraduates  funded  by  your  agreement  who  graduated  during  this  period  and  will  continue 
Summary of Project Objective

From  January  2012  to  January  2015,  Dr.  Kun  Zhang  (the  PI)  at  Xavier  University  of  Louisiana 
and  Dr.  Philip  Yu  (the  Co-PI)  of  the  University  of  Illinois  at  Chicago  received  grant  funding 
(W911NF-12-1-0066) from the US Army Research Office to develop “An Integrated Framework
to  Access  and  Mine  Distributed  Heterogeneous  Data  Streams”. 

The objective of the proposed research was twofold. First, from the scientific aspect, we aimed to
develop a distributed, optimization-based, robust data stream mining framework to assist in
battlefield decision-making. By intelligently collecting and analyzing real-time, diversified and
uncertain streaming data, multi-source heterogeneous information, as well as constraints
dictated by previous experience and strategic policies, the developed framework is expected to
produce winning strategies as the synthesis of dynamic information gathering and
decision-making optimized over a pre-specified loss function, with constraints imposed on interacting
sub-components in each single step. Second, from the educational perspective, we intended to
offer a vital opportunity to involve African-American students in modern computational research.
The educational goals of initiating “data mining” and “mining data streams” classes at the
undergraduate level, and of recruiting, advising and training undergraduate researchers, were also
integral to the proposed project.

Project  Accomplishments 

To achieve this objective, the collaborative team conducted an active research program that produced
53 peer-reviewed publications. More specifically, the PI’s work on this project includes:

• The definition of a margin technique for dimension reduction with applications to
hyperspectral imagery [1];

• The design of an importance sampling driven, dynamic feature group weighting
framework (DFGW-IS) for classifying data streams of imbalanced distribution [2];

• The development of a novel one-class, semi-supervised algorithm to detect anomalies in
streaming data [3];

•  By  extending  the  above  techniques,  the  PI  also 

o  discovered  miRNA-mRNA  correlation-network  modules  in  human  prostate 
cancer  and  revealed  the  differences  between  primary  and  metastatic  tumor 
subtypes  [4]; 

o developed an allelic log-intensity-ratio based normalization method for SNP-CGH
arrays [5];

o  explored  the  sequence  structures  of  human  MicroRNA  molecules  and  their 
implications  [6]; 

o inferred the expression variability of human transposable element-derived exons
by linear model analysis of deep RNA sequencing data [7];

o constructed polymorphism-induced regulatory gene networks active in human
lymphocyte cell lines by weighted linear mixed model analysis of multiple RNA-Seq
datasets [8];


o  revealed  the  miRNA-mediated  relationships  between  Cis-SNP  genotypes  and 
transcript  intensities  in  lymphocyte  cell  lines  [9]; 
o  identified  that  somatic  mutations  favorable  to  patient  survival  are  predominant  in 
ovarian  carcinomas  [10]; 

o  investigated  the  relationship  between  transposable  elements  and  the  duplication 
and  intragenic  domain  expansion  of  human  C2H2  Zinc  finger  genes  [11]. 

On the Co-PI’s side, representative work includes “GBC: Gradient Boosting Consensus
Model for Heterogeneous Data” [12], “An Efficient Approach for Outlier Detection with
Imperfect Data Labels” [13], “Uncertain One-Class Learning and Concept Summarization
Learning on Uncertain Data Streams” [14], “Collective Prediction of Multiple Types of Links in
Heterogeneous Information Networks” [15], and 38 other studies [16-53].

From 2012 to 2015, 20 Xavier STEM undergraduate researchers funded by this agreement
were involved in the proposed project. 95% of those students were African American, and 25%
of them were female. During this period, 11 of these undergraduates graduated. Four of
those graduates went to work in the IT industry, at companies such as Microsoft and IBM, and two are
attending graduate school to pursue a Master’s or Ph.D. in computer science or information
systems. Please refer to Appendix 1 for the details of the funded students.

In addition, the PI introduced two new courses into the curriculum. One is a data mining
course aimed at junior and senior majors, and the other is an
interdisciplinary course titled “CPSC2900 Introduction to Bioinformatics Programming”.
Designed for students with minimal prior programming experience, this freshman-level course
aims to offer the fundamental bioinformatics programming skills necessary to exploit the
abundance of biological data. Both courses have been fully developed and are offered in spring
2015. Sample lectures and problem sets were also given to the STEM students in the summer
research seminars, with positive feedback.

In summary, this DOD agreement not only allowed us to perform a comprehensive study of data
stream mining and its promising applications in biomedical domains, but also fostered and enriched
the research experiences of under-represented minority students at Xavier and opened
opportunities for them in graduate school or future careers in the IT industry. Moreover, this
support has motivated us to explore other important aspects of mining streaming data, such as
anomaly or outlier detection, which merit several more years of investigation, as described in
the PI’s proposal submitted to the “Fiscal Year 2015 Department of Defense Research and
Education Program for Historically Black Colleges and Universities and Minority-Serving
Institutions (HBCU/MI)”.